Abstract

Background: It is usually possible to identify the sex of a pre-pubertal child from their voice, despite the absence of sex
differences in fundamental frequency at these ages. While it has been suggested that the overall spacing between formants
(formant frequency spacing - ΔF) is a key component of the expression and perception of sex in children's voices, the effect
of its continuous variation on sex and gender attribution has not yet been investigated.

Methodology/Principal findings: In the present study we manipulated voice ΔF of eight-year-olds (two boys and two girls)
along continua covering the observed variation of this parameter in pre-pubertal voices, and assessed the effect of this
variation on adult ratings of speakers' sex and gender in two separate experiments. In the first experiment (sex
identification) adults were asked to categorise the voice as either male or female. The resulting identification function
exhibited a gradual slope from male to female voice categories. In the second experiment (gender rating), adults rated the
voices on a continuum from "masculine boy" to "feminine girl", gradually decreasing their masculinity ratings as ΔF
increased.

Conclusions/Significance: These results indicate that the role of ΔF in voice gender perception, which has been reported in
adult voices, extends to pre-pubertal children's voices: variation in ΔF not only affects the perceived sex, but also the
perceived masculinity or femininity of the speaker. We discuss the implications of these observations for the expression and
perception of gender in children's voices given the absence of anatomical dimorphism in overall vocal tract length before
puberty.

Citation: Cartei V, Reby D (2013) Effect of Formant Frequency Spacing on Perceived Gender in Pre-Pubertal Children's Voices. PLoS ONE 8(12): e81022.
doi:10.1371/journal.pone.0081022

Editor: Howard Nusbaum, The University of Chicago, United States of America 
Received July 1, 2013; Accepted October 10, 2013; Published December 3, 2013 

Copyright: © 2013 Cartei, Reby. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 

Funding: These authors have no support or funding to report. 

Competing Interests: The authors confirm that David Reby serves as an academic editor for this journal. This does not alter their adherence to all the PLOS ONE 
policies on sharing data and materials. 

* E-mail: val.cartei@sussex.ac.uk 



Introduction 

Adults can discriminate the sex of adult [1] and child [2,3]
speakers by listening to their voice only. Sex identification in adult
voices is substantially determined by acoustic differences in
fundamental frequency (F0) and in the overall pattern of formant
frequencies (ΔF, or formant spacing), which in turn reflect
anatomical dimorphisms in the vocal apparatus between the two
sexes. During male puberty, the testosterone-related growth of the
laryngeal cartilages [4-6], and the associated lengthening and
stiffening of the vocal folds [7,8] cause men's F0 to drop by almost
50% compared to women's (men's F0: 120 Hz; women's: 200 Hz
[8]), giving men their characteristically lower-pitched voices.
Moreover, the testosterone-induced difference in body height, with
men being on average 7% taller than women [9], coupled with the
male-specific secondary descent of the larynx [10], results in men
having longer vocal tracts and thus narrower ΔF (15-20% [11,12])
than women, conferring a disproportionately more baritone
quality to the male voice [10].

The voices of pre-pubertal children are also acoustically and
perceptually different, and perceptual studies show that adults are
able to correctly identify gender from the voice in children as
young as four [3]. Several acoustic investigations have shown that,
while children of both genders speak with similar F0s ([13-15]; but
also see [16]), boys speak with lower formants and consequently
narrower ΔF than girls [2,3,13,14,17,18] despite the absence of
overall differences in vocal tract length between the two sexes
before puberty [10,19-21]. This dimorphism has led to the
suggestion that pre-pubertal sex differences in ΔF have a
behavioural basis (for example boys may round their lips or lower
their larynx when they speak to lengthen their vocal tracts
[2,14]).

Taken together, these studies indicate that the between-sex
dimorphism in the voice frequency characteristics (ΔF only in
children and both ΔF and F0 in adults) is perceptually relevant for
categorizing the sex of speakers. Moreover, at least in adult voices,
between-speaker variation in these parameters appears to also
influence the perception of gender, a term which encompasses the
biological and social attributes which a given society deems typical
of either male (masculine attributes) or female (feminine attributes)
sex [22]. For example, listeners consistently rate adult voices with
naturally or artificially lower F0, lower ΔF, or both, as belonging
to more masculine individuals than their raised versions [23,24].
While variation in F0 and ΔF, which are both sexually dimorphic
in adult voices, has been shown to influence listeners' attributions
of adults' sex and gender characteristics, to our knowledge the
effect of naturalistic variation in ΔF on sex and gender attributions
has not been investigated in children's voices, despite the fact that
this trait is sexually dimorphic.

Here we investigate whether small increments of ΔF in
children's voices affect sex (male, female), as well as gender
(masculine, feminine) attributions by adult listeners. In the first
experiment (sex identification) we resynthesize ΔF along gender
continua within the observed natural variation of this parameter
and ask listeners to identify the sex of the speakers. We expect the
identification function to be characterized by a gradual change
from the male to the female category. In the second experiment
(gender rating), we ask listeners to rate each voice stimulus on a
scale that combines sex and gender information (from "masculine
boy" to "feminine girl"). We expect that small, consecutive
increments in ΔF will elicit a gradual increase in listeners' ratings
from "masculine boy" to "feminine girl".

Materials and Methods 

Ethics statement 

Written consent from children's guardians as well as verbal 
consent from children were obtained prior to the recording of the 
voice stimuli. All adult subjects taking part in the psychoacoustic 
experiments gave written informed consent. Both procedures 
(voice recording and psychoacoustic experiments) were reviewed 
and approved by the Ethics Committee of the University of Sussex 
(authorization codes: DRVC0709 and DRVC0711).

Subjects 

252 second-year Psychology students (74 males, 178 females)
from Sussex University took part in the psychoacoustic experiments
(as part of their practical coursework in a Cognitive
Psychology level two module). All subjects were fluent English
speakers.

Stimuli 

Speech utterances were recorded using a Shure SM94 microphone
and a Tascam DR07mkII handheld recorder at a primary
school in Sussex, as part of a previous study of gender expression
in children's speech. During these recordings, two girls and two
boys aged eight were asked to read out seven short words ("bed",
"boot", "book", "box", "duck", "hat", "pig"). The recorded
single-syllable words were individually standardized to 65 dB and
concatenated prior to acoustic analysis and resynthesis.

Acoustic analyses 

We extracted F0 and formant frequencies using PRAAT
v.5.1.19 freeware [25]. F0 was extracted using the command 'to
Pitch', with analysis parameters set to: time-step 0.01 s; pitch floor,
60 Hz; pitch ceiling, 500 Hz. The frequency values of the first
three formants (F1, F2, F3) were extracted using linear predictive
coding (LPC) via the 'LPC: To Formants (Burg)' command, with
analysis parameters set to: maximum number of formants, 5;
maximum formant frequencies, 6000-6600 Hz; window of
analysis, 0.025 s. Formant spacing, defined as

$$\Delta F = F_{i+1} - F_i \qquad (1)$$

was derived from F1-F3 values, by modelling the vocal tract as a uniform tube
closed at the glottis and open at the mouth [26,27]. Under such a
model, the formant frequencies $F_i$ are expressed as:

$$F_i = \frac{(2i-1)\,c}{4\,\mathrm{VTL}} \qquad (2)$$

where $i$ is the formant number, $c$ is the speed of sound in a
mammal vocal tract (35,000 cm/s), VTL is the vocal tract length
(in cm) and $F_i$ is the frequency (in Hz) of the $i$th formant. From (1) and
(2), it follows that

$$\Delta F = F_{i+1} - F_i = \frac{c}{2\,\mathrm{VTL}} \qquad (3)$$

By replacing $c/(2\,\mathrm{VTL})$ with ΔF in equation (2), ΔF can be derived as the slope of a
regression model with the observed $F_i$ values (y-axis) plotted
against the expected formant positions:

$$F_i = \Delta F\,\frac{2i-1}{2} \qquad (4)$$

and the apparent vocal tract length (aVTL), as its inverse acoustic
correlate measured in cm (aVTL = c/2ΔF). Therefore the longer
the vocal tract, the lower the formant frequencies, and the
narrower their overall frequency spacing. All extracted and
derived acoustic values are reported in Table 1.
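
The derivation above can be illustrated with a short Python sketch. It is not the authors' PRAAT procedure: the function names are hypothetical, and the zero-intercept least-squares fit is an assumption suggested by the uniform-tube model, in which the observed formants are proportional to the expected positions (2i - 1)/2. Under that assumption, the F1-F3 values listed for Girl 1 in Table 1 give a formant spacing of about 1383 Hz and an apparent vocal tract length of about 12.7 cm, matching the tabulated values.

```python
from typing import Sequence

SPEED_OF_SOUND_CM_S = 35_000.0  # value of c given in the text (cm/s)


def formant_spacing(formants_hz: Sequence[float]) -> float:
    """Estimate formant spacing (Hz) from F1..Fn.

    Assumption: the regression has no intercept, so the slope is
    sum(x * F) / sum(x * x) with x = (2i - 1) / 2 for i = 1..n.
    """
    positions = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(positions, formants_hz))
    denominator = sum(x * x for x in positions)
    return numerator / denominator


def apparent_vtl(delta_f_hz: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Apparent vocal tract length in cm: aVTL = c / (2 * delta_F)."""
    return c / (2 * delta_f_hz)


# F1-F3 (Hz) of the "Girl 1" exemplar as listed in Table 1
girl1 = [921, 2125, 3381]
spacing = formant_spacing(girl1)
print(round(spacing), round(apparent_vtl(spacing), 1))
```

The same two functions applied to the other three exemplars reproduce their Table 1 entries as well, which is why the zero-intercept form was chosen for this sketch.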

Re-synthesis 

Following acoustic analysis, the stimuli were resynthesized using
the "change gender" command in PRAAT. This command uses
PSOLA, a resynthesis algorithm that allows the independent
manipulation of formant frequency spacing (ΔF), mean fundamental
frequency (F0), F0 variation and signal duration while
keeping the values of all the other acoustic parameters (amplitude,
noisiness etc.) unchanged. The mean fundamental frequencies
were all standardised to 260 Hz (the average F0 measured in our
sample). In order to remove possible intonation cues to gender, F0
variation was flattened by adjusting F0 values to the mean F0 (thus
making the voice monotonous). Formant values were scaled up or
down in increments of 2%, mimicking equivalent variations of ΔF
(and thus aVTL) in speakers' voices. An increase of 2% of formant
frequencies (achieved in the 102% stimuli) equates to a 2%
increase in ΔF (corresponding to a shortening of the vocal
tract by about 2%), and is expected to feminise the voice. As formant
frequencies in our sample were on average 6% lower in the boy
exemplars than in the girl exemplars, just below the gender
difference reported in the literature for children of similar age
(9-10% [3,18]), male voices were rescaled from 88% to 118%, while
female voices were rescaled from 82% to 112%. The resulting
continua were therefore not identical, but largely overlapping: the
boys' continuum ranged from 1526 Hz to 1138 Hz (aVTLs from
11.5 cm to 15.5 cm), while the girls' continuum ranged from
1542 Hz to 1129 Hz (aVTLs from 11.4 cm to 15.5 cm).
Supplementary online material includes audio files of example
stimuli for one girl (Audio S1) and one boy (Audio S2) exemplar. The
resulting continua are within the range of ΔF variation observed in
pre-pubertal children, as derived from published F1-F3 values
[14], with aVTLs ranging from 11.4 cm to 15.9 cm for 5-12 year
old children. They are also consistent with anatomical variation
reported in [10], where VTLs for boys and girls, measured during
quiet respiration, varied from 9.7 cm at age 5 to 14.0 cm at age
12. In summary, we generated 64 audio stimuli consisting of 16
resynthesised variants of the single-syllable word lists by the two boys
and the two girls. Figure 1 shows spectrograms of the vowel from "book"
spoken by one of the exemplars, in which the formants (dark bands
of energy in the spectrogram) are shifted compared to the original
signal, while signal duration, F0 and F0 variation remain
unchanged.

Table 1. Acoustic variables (F0, Fi, ΔF in Hz) and apparent
Vocal Tract Length (aVTL in cm) characterising the 4
exemplars (measured on concatenated strings of CVC words).

Exemplar   F0 (Hz)   F1 (Hz)   F2 (Hz)   F3 (Hz)   ΔF (Hz)   aVTL (cm)
Girl 1     237       921       2125      3381      1383      12.7
Girl 2     304       859       2099      3370      1372      12.8
Boy 1      237       786       1933      3175      1283      13.6
Boy 2      262       768       2015      3194      1302      13.4

Average ΔF was 1377 Hz (aVTL 12.7 cm) for the two girl exemplars and 1293 Hz
(aVTL 13.5 cm) for the two boy exemplars.
doi:10.1371/journal.pone.0081022.t001
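
The arithmetic behind the two resynthesis continua can be sketched as follows. This is only an illustration of the 2% scaling steps, not the PRAAT resynthesis itself; the function name is hypothetical, and the sketch assumes that each variant's formant spacing is the mean formant spacing of the boy or girl exemplars (1293 Hz and 1377 Hz, as given in the note to Table 1) multiplied by the scaling percentage.

```python
from typing import List


def spacing_continuum(base_hz: float, first_pct: int, last_pct: int,
                      step_pct: int = 2) -> List[int]:
    """Rounded formant spacing (Hz) for each resynthesis variant.

    Assumption: variant spacing = base spacing * scaling percentage / 100.
    """
    return [round(base_hz * pct / 100)
            for pct in range(first_pct, last_pct + 1, step_pct)]


boys = spacing_continuum(1293, 88, 118)   # boy exemplars: 88% to 118%
girls = spacing_continuum(1377, 82, 112)  # girl exemplars: 82% to 112%
print(len(boys), boys[0], boys[-1])
print(len(girls), girls[0], girls[-1])
```

Each continuum contains 16 variants, and the end points (1138-1526 Hz for boys, 1129-1542 Hz for girls) agree with the ranges reported in the text.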

Procedure 

Participants completed the identification experiment first.
Stimuli were presented using a PRAAT Multiple Forced Choice
(MFC) experiment script and for each stimulus participants were
asked to decide if the speaker was male or female (the instruction
was: "Please identify the sex of the speaker") by clicking the
respective button on the screen (labelled "male" or "female"). A
total of 64 different stimuli (16 variants from four exemplars) were
presented once in a pseudo-random order. Participants were given
an opportunity to pause after each series of 32 presentations. This
experiment lasted approximately 10 minutes. In the second
experiment, participants were asked to rate the same 64 voice
stimuli from the sex identification task (also presented in a
pseudo-random order using an MFC experiment script). The instruction
was: "Rate the voice of the speaker on a scale of 1 to 7" and
buttons were labelled as 1 = masculine boy, 2 = boy, 3 = feminine
boy, 4 = neutral, 5 = masculine girl, 6 = girl, 7 = feminine girl.

Statistical analyses 

Because different sets of resynthesis variants (different formant 
scaling factors) were used for male and female exemplars, data are 
analysed and reported separately by exemplar's sex. 

In order to test the effect of stimuli variant and listener sex on 
sex identification, we ran Generalised Linear Mixed Models 
(GLMM) with stimuli variant (scale), listener sex (nominal) and 
their interaction as fixed factors, exemplar id and subject id as 
random factors, and sex identification score (0 = male, 1 = female)
as a binomial target variable. In order to test the effect of stimuli
variant and listener sex on gender ratings we ran Linear Mixed 
Models (LMM) with stimuli variant (scale), listener sex (nominal) 
and their interactions as fixed factors, exemplar id and subject id 
as random factors, and gender rating as a scale outcome variable 
(from 1 = masculine boy to 7 = feminine girl). 

Simple logistic regressions (one for boy exemplars and one for
girl exemplars) were then used to illustrate the relationship
between formant frequency spacing and identified sex with
average score (over all participants) as the dependent variable
and stimuli variant as the independent variable. Logistic models
provide estimates for the slope of the category (here 'male' to
'female') transition (b1 coefficient, ranging between 0 and 1, with
lower values reflecting steeper transitions) [28-30] and for the
perceived category boundary (where 50% of stimuli are categorised
as male, and 50% as female). The category boundary was
computed using the formula -Ln(b0)/Ln(b1) where b0 is the
constant of the logistic curve and b1 is the coefficient related to the
slope [30,31]. Simple linear regressions with stimuli variant as the
predictor variable and average gender ratings (over all the
participants) as the outcome variable were used to illustrate the
relationship between formant frequency spacing variant and
perceived gender. All the statistical analyses were performed using
SPSS v.20.0.
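
The category boundary formula can be made concrete with a small Python sketch. The functional form of the curve is an assumption: the text gives only the boundary formula, and the sketch uses y = 1 / (1 + b0 * b1^x), a two-parameter logistic with an upper bound of 1 for which the 50% point is exactly -Ln(b0)/Ln(b1). The function names are hypothetical, and the coefficients in the example are those reported for the boy exemplars in the Results.

```python
import math


def logistic_curve(x: float, b0: float, b1: float) -> float:
    """Proportion of 'female' responses at stimulus number x.

    Assumed form: y = 1 / (1 + b0 * b1 ** x), with 0 < b1 < 1.
    """
    return 1.0 / (1.0 + b0 * b1 ** x)


def category_boundary(b0: float, b1: float) -> float:
    """Stimulus number where the assumed curve crosses 0.5."""
    return -math.log(b0) / math.log(b1)


# Coefficients reported for the boy exemplars (b0 = 127.43, b1 = .65)
boundary = category_boundary(127.43, 0.65)
print(round(boundary, 2))
print(round(logistic_curve(boundary, 127.43, 0.65), 2))
```

With these coefficients the boundary falls at about stimulus 11.25, where the assumed curve returns 0.5, in agreement with the value reported for the boy exemplars.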

Results 

Sex identification experiment 

The results of the GLMM on sex identification scores of boy
exemplars revealed a significant main effect of stimuli variant,
F(1,8060) = 2,696.66, p<.001, while no significant main effects of
listener's sex, F(1,8060) = 2.50, p = .114, and of its interaction with
stimuli variant, F(1,8060) = 3.47, p = .063, were found. A logistic
regression (Fig. 2 - black line) provided a strong statistical fit for
the observed relationship between stimuli variant and average sex
identification scores, R² = .95, F(1,14) = 240.43, p<.001. The relatively
shallow transition (b1 = .65) from one response category to
the other indicates that the percentage of stimuli identified as
female increases progressively as ΔF increases. Using this model,
the estimated "male-female" boundary fell between stimulus 11
and 12 (-Ln(127.43)/Ln(.65) = 11.25, where b0 = 127.43 and
b1 = .65, corresponding to 108%-110% variants or
ΔF ~1400 Hz).

[Figure 1 panels: A, masculinised voice; B, prototypical voice; C, feminised voice. x-axis: Time (s).]

Figure 1. Spectrograms of vowel "U" (from "book") created from girl exemplar 1. Spectrogram settings: window length = .025 s,
maximum number of formants, 5; maximum formant frequencies, 6000-6600 Hz. The formants (labeled F1-F4) are shifted down by 18% (A) and up
by 12% (C) in comparison to the original signal (B), while all other acoustic parameters, including fundamental frequency, remain unchanged.
doi:10.1371/journal.pone.0081022.g001

[Figure 2 x-axis scales]
Resynthesis variant (%) 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118
Formant Spacing (Hz) 1138 1164 1190 1215 1241 1267 1293 1319 1345 1371 1396 1422 1448 1474 1500 1526

Figure 2. Identification and rating scores of boys' voices along the gender continua. Scores were averaged across listeners on voice
stimuli (numbered 1-16 on the x-axis) for the boys' exemplars. The mean identification scores are plotted from 0 = male to 1 = female (left y-axis) and
fitted with the logistic curve (black line). The vertical lines illustrate the location of the estimated sex boundary (where 50% of the listeners rate the
stimuli as female) and the location of the prototypical boy voice stimulus (100%). The percentage of stimuli identified as female follows an S-shaped
pattern along the continuum of resynthesis variants. The sex identification curve is characterised by a lower plateau for stimuli 1 to 6 (ΔFs of
1138-1267 Hz), where less than 10% of the stimuli are identified as female, indicating that stimulus variants with the lowest ΔF are mostly identified as male.
The percentage of stimuli identified as female then increases gradually and linearly, and while no upper plateau is reached, average scores for stimuli
14 to 16 (ΔFs of 1474-1526 Hz) varied from 76% to 85%, indicating that boys' voices with the highest ΔF are mostly classified as female. Average
gender rating scores are plotted from 1 = masculine boy (or girl) to 7 = feminine boy (or girl) (right y-axis) and fitted with a linear function (straight
grey line). Mean gender ratings of male voices ranged from 1.78 (SE = .07) for the lowest ΔF variants to 5.36 (SE = .08) for the highest ΔF variants.
doi:10.1371/journal.pone.0081022.g002

The results of the GLMM on sex identification scores of girl
exemplars revealed a significant main effect of stimuli variant,
F(1,8060) = 1,869.28, p<.001, while no significant main effects of
listener's sex, F(1,8060) = 1.99, p = .158, and of its interaction with
stimuli variant, F(1,8060) = 2.04, p = .153, were found. A logistic
regression (Fig. 3 - black line) provided a strong statistical fit for
the observed relationship between stimuli variant and average
identification scores, R² = .97, F(1,14) = 382.14, p<.001. The relatively
shallow transition (b1 = .67) from one response category to
the other indicates that the percentage of stimuli identified as
female increases progressively as ΔF increases. Using this model,
the estimated "male-female" boundary fell between stimulus 7 and
8 (-Ln(17.37)/Ln(.67) = 7.13, where b0 = 17.37 and b1 = .67,
corresponding to 94%-96% variants or ΔF ~1300 Hz).
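
As a rough check on how the estimated boundaries translate into formant spacing, the sketch below converts a fractional stimulus number into Hz. It rests on several assumptions that are not stated as a procedure in the text: stimuli are numbered 1-16 from the lowest to the highest variant, consecutive variants differ by 2%, values between two variants are interpolated linearly, and the 100% variant corresponds to the mean formant spacing of the exemplars (1293 Hz for boys, 1377 Hz for girls). The function name is hypothetical.

```python
def boundary_to_hz(stimulus_number: float, base_hz: float,
                   first_pct: float, step_pct: float = 2.0) -> float:
    """Formant spacing (Hz) at a fractional stimulus number.

    Assumption: stimulus 1 is the first_pct variant and each further
    stimulus adds step_pct percentage points (linear interpolation).
    """
    pct = first_pct + step_pct * (stimulus_number - 1)
    return base_hz * pct / 100


boys_hz = boundary_to_hz(11.25, base_hz=1293, first_pct=88)
girls_hz = boundary_to_hz(7.13, base_hz=1377, first_pct=82)
print(round(boys_hz), round(girls_hz), round(boys_hz - girls_hz))
```

Under these assumptions the boundaries fall near 1403 Hz for the boy exemplars and 1298 Hz for the girl exemplars, consistent with the approximate values of 1400 Hz and 1300 Hz given in the text.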



Gender rating experiment 

The results of the LMM on gender ratings of boy exemplars
revealed a significant main effect of stimuli variant,
F(15,7781) = 692.41, p<.001. No significant main effect of listener's
sex, F(1,250) = 2.24, p = .136, and of its interaction with stimuli
variant, F(15,7781) = 1.136, p = .317, were found. The results of the
LMM on gender ratings of girl exemplars revealed a significant
main effect of stimuli variant, F(15,7781) = 626.87, p<.001. No
significant main effect of listener's sex, F(1,250) = .196, p = .658, and
of its interaction with stimuli variant, F(15,7781) = .714, p = .773, were
found. Simple linear regressions (Figures 2 and 3 - grey straight
lines) provided strong statistical fits for the observed correlation
between variant number and average gender rating scores,
showing that scores increased (from masculine boy to feminine
girl) as formant frequency spacing increased (male exemplars:
R² = .99, F(1,14) = 893.04, p<.001; female exemplars: R² = .97,
F(1,14) = 459.94, p<.001).

[Figure 3 x-axis scales]
Resynthesis variant (%) 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112
Formant Spacing (Hz) 1129 1157 1184 1212 1239 1267 1294 1322 1349 1377 1405 1432 1460 1487 1515 1542

Figure 3. Identification and rating scores of girls' voices along the gender continua. Scores were averaged across listeners on voice stimuli
(numbered 1-16 on the x-axis) for the girls' exemplars. The mean identification scores are plotted from 0 = male to 1 = female (left y-axis) and fitted
with the logistic curve (black line). The vertical lines illustrate the location of the estimated sex boundary (where 50% of the listeners rate the stimuli
as female) and the location of the prototypical girl voice stimulus (100%). The percentage of stimuli identified as female also follows an S-shaped
pattern along the continuum of resynthesis variants. The sex identification curve is characterised by a lower plateau for stimuli 1 to 3 (ΔFs of
1129-1184 Hz), where between 10% and 15% of the stimuli are identified as female, indicating that stimulus variants with the lowest ΔF are mostly identified
as male. The percentage of stimuli identified as female then increases gradually and linearly until it reaches an upper plateau from stimuli 12 to 16
(ΔFs of 1432-1542 Hz), with average scores varying from 92% to 95% and indicating that girl voices with the highest ΔF are mostly classified as
female. Average gender rating scores are plotted from 1 = masculine boy (or girl) to 7 = feminine boy (or girl) (right y-axis) and fitted with a linear
function (straight grey line). Mean gender ratings of female voices ranged from 2.33 (SE = .02) for the lowest ΔF variants to 6.10 (SE = .06) for the
highest ΔF variants.
doi:10.1371/journal.pone.0081022.g003



Discussion 

The results of the sex identification and gender rating
experiments show that ΔF is an important cue for the perception
of sex and gender in the pre-pubertal human voice, in line with the
previously reported acoustic dimorphism of this parameter in
pre-pubertal speakers [8,14,17,32]. More specifically, the absence of a
sharp boundary between the sex categories in the identification
experiment, in which listeners were asked to identify the child
speaker as male or female, suggests that small, sex-related acoustic
variation in ΔF proportionally affects the probability that voices
are perceived as either male or female by raters. Additionally, the
gradual slope in voice ratings from "masculine boy" to "feminine
girl" in the second experiment shows that small linear increments
in ΔF also proportionally affect listeners' attributions of speakers'
gender (from masculinity to femininity). Similar results have been
reported in studies of gender perception in adult voices. A study
using a combination of identification and discrimination paradigms
[29] found that variations along a male-female continuum
of F0 and ΔF, the main cues to sex in adult voices, were not
remapped by listeners into separate psychological (male or female)
categories, indicating that the perception of voice sex was not
categorical. Moreover, psychoacoustic studies have shown that
both men's and women's voices with naturally low, or artificially
lowered, F0 and ΔF (or both), are rated as more masculine
[23,24,33].

In the present study, while the resynthesis continua used for boy
and girl exemplars were largely overlapping (boys: 1138-1526 Hz;
girls: 1129-1542 Hz) and both fell within the range of ΔF
values achievable by both genders before puberty [10,14], the
effect of the rescaling of ΔF differed between boy and girl voice
exemplars, suggesting that the resynthesis of this parameter was
not sufficient to produce a voice systematically perceived as
belonging to the opposite sex, despite the standardisation of F0
and its variation. In the sex identification experiment, the
perceived sex boundary between male and female identification
estimated by the logistic model is ~100 Hz higher in boy voice
exemplars than in girl voice exemplars (Figure 2 - vertical lines),
revealing that a greater upward shift in ΔF was required for
resynthesized stimuli from the voices of the two boy exemplars to
be perceived as female. The identification curve (Figure 2 - black
line) for the male exemplars is also shifted downwards relative to
that of the female exemplars (Figure 3 - black line), with a wider
plateau at the lower (male) end of the continuum, and no plateau
at the upper (female) end of the continuum. Further, the boys'
rating function (Figure 2 - grey straight line) from the gender
rating experiment is shifted downwards compared to girls',
revealing that stimuli from boy exemplars were perceived as more
masculine than those from girl exemplars. One possible explanation
for the observed perceptual differences is that listeners were
affected by acoustic factors other than those manipulated (ΔF) or
factored out (F0 and its variation) in the present experiments. For
example, Klatt & Klatt [34] report that women are perceived to
have more breathy voices than men, corresponding to increased F1
bandwidths and decreased F1 amplitude, while breathy voices are
judged as more feminine than less-breathy voices [35], suggesting
that, at least in adults, breathiness may be a contributing factor to
the perception of sex and gender. The potential role of parameters
such as F0, F0 variation and breathiness [8,34], which are sexually
dimorphic in adults, but not in pre-pubertal children [13-15], in
the attribution of sex and gender to children's voices, is an
important area for future research.

Independently of other hypothetical voice cues to sex and
gender attributions of pre-pubertal children's voices, this study
clearly identifies a substantial effect of ΔF variation on adults'
ratings of gender in pre-pubertal speakers, with lower ΔF being
consistently rated as belonging to more masculine children. ΔF
variation has also been shown to affect judgements of body size
and age in adult speakers, with listeners rating lower ΔF as
belonging to older and larger individuals [36-39]. These
perceptual differences in turn appear to relate to actual differences
in age and size of speakers [39-41]. By extending the present
paradigm to include age and body size ratings, future studies could
investigate the perceptual linking of age-related size and gender
dimensions, for example whether children that are perceived to be
more masculine are also perceived to be older and bigger than
their more feminine counterparts. Moreover, the use of natural
(rather than re-synthesised) stimuli from children of different ages,
body sizes and masculinities (e.g. as assessed by children's personal
attributes questionnaires [42]), and of raters of different ages,
would help clarify the extent to which ΔF reliably cues these
dimensions throughout the lifespan.



Our observations that baseline ΔF variation within the natural
range of children's voices affects listeners' sex and gender
attributions (despite the absence of a clear anatomical basis for
such variation) lend further support to the hypothesis that sex and
gender expression in pre-pubertal children's voices have a strong
behavioural, acquired dimension (with children learning to adjust
their VTL in order to sound more or less feminine/masculine).
Future studies using, e.g., cine or 3D structural MRI are now
needed to further test this hypothesis.

Furthermore, it has been shown that children can also
spontaneously modify ΔF (and F0) when asked to sound more or
less like a boy or girl (Cartei, Cowles, Banerjee and Reby,
unpublished data), suggesting that children can also control the
gender-related characteristics of their voices. The extent to which
this ability affects the expression of gender in everyday speech, in
line with varying gendered roles (e.g. to affiliate with same-sex
peers) and contexts (e.g. when speaking to a male or female), and its
perceptual relevance in gendered attributions remain to be
investigated.

Supporting Information 

Audio S1. This audio file contains three variants derived
from one of the two girl exemplar voices (exemplar 2), in
which formant spacing was resynthesized from low
(longer vocal tract - more masculine sounding voice) to
high (shorter vocal tract - more feminine sounding
voice) values (ΔFs: 88%, 102%, 110%).
(WAV)

Audio S2. This audio file contains three variants derived
from one of the two boy exemplar voices (exemplar 4), in
which formant spacing was resynthesized from low
(longer vocal tract - more masculine sounding voice) to
high (shorter vocal tract - more feminine sounding
voice) values (ΔFs: 94%, 104%, 112%).
(WAV)